## Performance Analysis of Multiplane, Nonblocking ATM Switches \* Christos Kolias and Leonard Kleinrock Department of Computer Science University of California at Los Angeles Los Angeles, CA 90095-1596 U.S.A. ### Abstract Multiplane switches present a special interest in the field of ATM switching as they are high-performance scaleable architectures. Not only are they efficient in terms of performance but they also offer a high-degree of reliability, an issue which is very critical to ATM networks. In this paper we present a number of different switching architectures based on multiplane designs. We study and analyze them under various performance metrics, such as throughput, mean waiting time and cell drop probability, for which we derive exact analytical and numerical expressions, given a homogeneous traffic model. We elaborate also on their implementation suitability and the various tradeoffs that can emerge. #### 1 Introduction ATM switches have gained a lot of importance as they assume additional capabilities and functionality, beside their multiplexing and switchign tasks, in implementing and maintaining ATM networks. Such capabilities include traffic management, flow and congestion control and signaling. ATM switches can also host various network interfaces such as the public and private NNIs, B-ICI to mention a few. Deploying a scaleable, long-haul ATM network requires that the switches themselves, as the intermediate systems across the network, are scaleable and expandable as well. This has led to the design of various switching architectures. The choice as to which architecture to implement is made based on a wide set of selection and feasibility criteria such as the services (and quality) provided - which depend on the applications' requirements, the performance delivered, the bandwidth demands, reliability features and certainly cost. These must cater to the network designer's, administrator's and users' needs and desires. For instance, we mention that backbone switches (it is expected that ATM switches will mainly act as backbone switches) should be characterized by scaleability (ability to accept additional physical links) and increased reliability (in order to cope with network failures) without actually sacrificing any performance or requiring a much higher growth in complexity and consequently cost. Multiplane switches, also called parallel switches constitute a class of high-speed switching paradigms that certainly fall in the above category of switches. They are entirely scaleable architectures which make them also very cost-effective as they require minimal and quite simple hardware modifications. # 2 Multiplane Switching Architectures Multiplane switches are constructed from multiple identical, space-division, non-blocking crossbar switch fabrics that operate in parallel and are topologically bundled and fabricated together forming the switch's core. Multiplane switches are thought as a good compromise between performance and cost. Without being prohibitively expensive they offer a high-level of performance. Various architectures can emerge depending on the network designer's views and requirements. As an example, on which we elaborate later in this paper, we mention the need for redundancy within the switch. Switch redundancy calls normally for the availability of alternate paths (connections between the switch's inputs and outputs) should a software or hardware component fail (e.g., a switching plane) as to ensure the continuous flow across, i.e., a backbone network. This concern gives rise to the issue of switch reliability. Unfortunately, additional redundancy comes in the form of supplemental hardware and, or, software which of course translates into higher manufacturing costs. The switch redundancy enhances the redundancy and fault-tolerance along an established ATM connection. As we know this is very important in ATM networks since they are connection-oriented networks and there is no dynamic (re-)routing in the face of failures. Other related issues that should be taken into consideration when, generally, designing and building a switch are the switch's scaleability and feasibility for VLSI implementation. Multiplane switches offer a certain degree of scaleability due to their modular design.<sup>1</sup> <sup>\*</sup>Web URL: http://millennium.cs.ucla.edu/~ck <sup>&</sup>lt;sup>1</sup>Another large class of switches are the multistage interconnection networks (MINs), which are also constructed in a modular fashion (therefore they are also considered scaleable architectures). We can argue that the actual difference, in terms of scaleability, between these two classes, namely the multiplane switches and the MINs, is that the former offer scaleability (and expandability) in the "depth" dimension (stacked-planes architecture) which leads to a performance enhancement, while in the latter scaleability is im- Multiplane switching architectures have been proposed also as an alternative to tackle the head-of- line (HOL) problem.<sup>2</sup> They nomrally employ both input and output queueing. Queues on inputs are necessary since the HOL problem is not completely eliminated. After traversing the switching plane cells are collected and queued at the outputs (i.e., output queues) since multiple cells (from different planes) might request the same output within the frame of a timeslot and only can be switched out on an output link. All queues maintain a FIFO discipline. Multiplane switches are offered as an alternative to switches with speed-up [6], [7]. In multiplane switches the speed of the switch fabric matches the speed of the input and output links. The existence of parallel switching planes allows for multiple cells (from different inputs) destined to the same output to be switched simultaneously and independently. Thus multiplane switches perform a similar function as single-plane switches which they operate in multiple speeds of the output and input trunks in order to transfer a batch of cells from different inputs to the same output within the same timeslot. A very important feature of multiplane switches is, as we have already indicated, the offered redundancy: should one or more planes fail the remaining replicate planes can accommodate the additional load. Planes are used in parallel and concurrently, rather than on a standby basis, in order to evenly distribute the input load. In what follows we examine a number of different multiplane-based schemes and comment on their differences and tradeoffs. We give exact results with respect to their performance as this is measured in terms of the maximum throughput $\gamma_{max}$ achieved by the switch (output ports), mean delay T within the switch and cell drop (due to lack of buffer storage) probability $P_B$ . # 3 Modeling Approach Since ATM networks are based on the structure of fixed-size packets called cells, ATM switches are modeled as time-slotted, multi-queue, multi-server queueing systems, where one timeslot represents the time to transmit a cell. Therefore each switching plane in a multiplane switch is modeled as a discrete-time queueing system. The multiplane switching architectures we consider have N inputs and N outputs with N arbitrarily large (theoretically $N \to \infty$ ). We also assume that they consist of m parallel switching planes.<sup>3</sup> We assume that each input line carries the same traffic load $\lambda$ , which also denotes the utilization of each input port. An incoming cell arrives at an input, at the beginning of an arbitrary time slot, with probability $\lambda$ (inde- pendent Bernoulli arrivals) and is uniformly destined to any output port, namely with probability 1/N. Both the balanced traffic and destination uniformity assumptions render a homogeneous switch model. Any cells that arrive at the HOL positions are called *fresh* cells. The multiple planes are configured in such a way that the traffic load is equally distributed among all m switching planes. No special scheduling algorithms are implemented or considered as to which planes cells are routed through; choices are made on a rather random basis. An output queue is shared by all m planes. An incoming cell that enters one of the (FIFO) input queues waits until it is routed through one of m planes to its desired output queue. Fig. 1 portrays the tandem queueing system an incoming cell goes through while it is in the multiplane switch. Let us assume for the moment that both input and output queues are infinitely large, thus we consider a noloss system. The overall switch throughput is denoted by $\gamma$ where at steady-state $\gamma = \lambda$ . We are interested in finding the maximum throughput $\gamma_{max}$ which occurs at saturation (i.e., queue size and waiting time become infinitely large). Therefore $$\gamma_{max} = \begin{cases} \lambda, & \lambda < \gamma_{max} \\ \gamma_{max}, & \lambda \ge \gamma_{max} \end{cases}$$ (1) We should point out that it is the input queues, due to the HOL blocking, that present a limiting factor in throughput.[2] The minimum time spent<sup>4</sup> by a typical cell while it is in the switch is two timeslots: one for being routed through the switching plane and one for being transmitted on its selected output trunk, thus T > 2. Figure 1: A representation of the tandem queueing system and the waiting times that an incoming cell experiences as it traverses a multiplane switch. The total waiting time W that a cell experiences as it travels through the switch consists of two components: $W_1$ - which represents the waiting time while the cell is buffered in its input queues plus the fixed time (equal to one timeslot) for the cell to be switched (through a switching plane) to its corresponding output buffer and $W_2$ - which denotes the waiting time the selected HOL cell spends queued in the output buffer awaiting for its turn to be transmitted on the output trunk. Hence $$W = W_1 + W_2 \tag{2}$$ The total delay is simply $T = \overline{W} + 1$ , since it takes only one timeslot to transmit a cell out from its final destination part An input queue is modeled as a $Geom(\lambda)/G/1$ queueing system where G characterizes the time spent by a cell while it resides in its HOL position (until it is routed). We plemented in the width dimension (more stages) as to accommodate more ports (and thus connections), given that we can view switches as a 3-D architectures. <sup>&</sup>lt;sup>2</sup>HOL blocking refers to the situation where more than one HOL cells (cells at the head of their queues) contend for the same output port and try to access the same switching plane - only up to one HOL cell can be routed. Those HOL cells that lost the contention remain at their positions blocking therefore any cells that are queued behind them, assuming a FIFO service order. <sup>&</sup>lt;sup>3</sup>For m=1 we get the simple input-queueing crossbar switch [2], thus generally $m \geq 2$ . <sup>&</sup>lt;sup>4</sup>Which refers to the case where there is no queueing whatsoever. refer to that delay (which includes the fixed transmission time) as the HOL holding time. Since we assumed that the cell arrivals are i.i.d. and the output port destinations are uniformly distributed then we can assume that the HOL holding times are i.i.d. as well. We consider an early arrival model for the Geom/G/1 queueing system: departures occur right before the time boundary (at the end of the timeslot) while arrivals immediately after the time boundary (at the beginning of the timeslot). Clearly then $W_1$ is equal to the delay of a Geom/G/1 queue. Its mean value is given by (cf. [5]): $$\overline{W_1} = T_{Geom/G/1} = \overline{S} + \frac{\lambda(\overline{S^2} - \overline{S})}{2(1 - \lambda \overline{S})}$$ (3) where $\overline{S}$ and $\overline{S^2}$ are the first and second moments, respectively, of the service time (HOL holding time) in the $Geom(\lambda)/G/1$ . Each output queue, which can potentially receive cells from all m planes, then behaves essentially as a $Geom^X/D/1$ bulk arrival (discrete time) queueing system. For the $Geom^X/D/1$ batch arrival we assume also an early arrival model with immediate access: if a batch of cells (from different planes) arrive at an empty output queue one of them (randomly chosen) is transmitted out on the output trunk while the remaining cells are queued. Thus, $W_2$ is merely the waiting time in an $Geom^X/D/1$ queue. In [4] we derive the mean waiting time for the $Geom^X/D/1$ queue which is (note that this is not affected by the service order as long as the latter is impartial): $$\overline{W_2} = \frac{\gamma(1 - \frac{1}{m})}{2(1 - \gamma)} \tag{4}$$ where $\gamma$ is the effective arrival rate to an output queue (each plane contributes simply $\gamma/m$ ) If we now assume that both input and output queues have a finite storage capacity, i.e., $K_1$ and $K_2$ respectively, then we would be interested in finding the switch's cell drop probability.<sup>5</sup> Cell loss, inside a multiplane switch, occurs in two cases: (i) when there is no buffer space to store a new incoming cell at the input queue to which it arrives, and (ii) when a HOL cell is denied access to an output queue, after it has traversed one of the switching planes. In the latter case we assume no backpressure; that is if there is no room in the output queue the cells are simply discarded rather than remain in their HOL positions. The switch's overall cell drop probability is given as $$P_B = 1 - (1 - P_{B_1})(1 - P_{B_2}) \tag{5}$$ where $P_{B_1}$ and $P_{B_2}$ are the blocking probabilities for the input and output queues respectively. The aggregate output rate from the multiplane switches is now expressed (see Fig. 2) as $$\lambda' = \lambda(1 - P_{B_1}) \tag{6}$$ while the switch's throughput is $$\gamma = \lambda'(1 - P_{B_2}) = \lambda(1 - P_B) \tag{7}$$ Figure 2: Cell loss in a multiplane switch. In order to obtain the input-queue size distribution and assuming that $\lambda$ is the arrival rate we model an input queue as a $Geom(\lambda)/Geom(q)/1/K_1$ queue (cf. [1], [5]), which is an approximate model to the $Geom/G/1/K_1$ queue. Note that q is the probability that a HOL cell is served (i.e., transferred to its output queue) at the beginning of the timeslot. This HOL departure probability is expressed as the reciprocal of the mean HOL holding time $\overline{S}$ , i.e., $q=1/\overline{S}$ and is a function of $\lambda$ (cf. [5]). A finite-capacity output queue is modeled as a $Geom^X/D/1/K_2$ queue. In [4] we give some important results with reference to the Geom/Geom/1 and $Geom^X/D/1$ (finite and infinite capacity) queueing systems. We will occasionally use some of the results derived there to carry out our performance study in this paper. # 4 Scheme A - Single Input Queues Figure 3: A 4 $\times$ 4 scheme A ATM Switch with m=2 switching planes. We consider a multiplane switch, which we call Scheme A, where each input port has a single queue that feeds all m planes (each plane receives a total load of $\frac{\lambda N}{m}$ , at non-saturation). Figure 3 shows the case where m=2. A HOL cell randomly chooses one of the m planes to be routed through. Therefore, there are a total of m lines that connect an input queue to the switching planes. The obvious merit of this scheme is redundancy: should a plane fail then any of the remaining planes can still be used. We can modify (and in a way generalize) scheme A as follows: let n (instead of m) be the number of links between an input queue and the switching planes, in such a way that an input queue feeds only n (of the m) planes and that all the planes have the same number of inputs (i.e., under a balanced traffic and due to the random selection all planes then receive the same input load). Note that now each switching plane is an $\frac{nN}{m} \times N$ crossbar switch (still offered a total traffic load of $\frac{\lambda N}{m}$ ), where <sup>&</sup>lt;sup>5</sup>One can easily modify the results we derive to account for infinite only input or output queues. $n \leq m$ . We claim that this generic multiplane switch has a maximum throughput of $\gamma = 1 + m - \sqrt{1 + m^2}$ . We will show this result by considering the mean waiting time for an input queue which is given by Eq. (3). We notice that as $\overline{W_1} \to \infty$ then the input queue attains its maximum throughput which is $\gamma_{max}$ meaning that the denominator of Eq. (3) is zero yielding $$\gamma_{max} = \frac{1}{\overline{S}} \tag{8}$$ Let $A_{j,k}$ be the number of the fresh HOL cells destined to the j-th output to be routed through the k-th plane. Then we can show, as in [2], that for $N \to \infty$ , $A_{j,k}(z)$ becomes the z-transform of a Poisson process with parameter $\lambda/m$ (independent of n), which means that the arrival process of all the fresh HOL cells destined to the same output j, namely $A_j = \sum_{k=1}^m A_{j,k}$ , is Poisson<sup>6</sup> with parameter $\lambda$ . It is evident then that the HOL holding time is equal to the delay in a discrete $M(\lambda/m)/D/1$ queue with random order of service (ROS), thus $$\overline{S} = \overline{W}_{M/D/1} + 1 = \frac{\lambda/m}{2(1-\lambda/m)} + 1 = \frac{2m-\lambda}{2(m-\lambda)}$$ (9) Applying Eq. (9) in Eq. (8) and solving for $\lambda$ yields $$\gamma_{max} = 1 + m - \sqrt{1 + m^2} \tag{10}$$ Throughput is then expressed as $$\gamma = \begin{cases} \lambda, & \lambda < 1 + m - \sqrt{1 + m^2} \\ 1 + m - \sqrt{1 + m^2}, & \lambda \ge 1 + m - \sqrt{1 + m^2} \end{cases}$$ (11) It should be noted that $\lim_{m\to\infty} \gamma_{max} = 1$ , which is the same throughput as in ATM switches with pure output queueing [2]. We now proceed with the calculation of the mean waiting time for a cell entering the multiplane switch. With respect to the discrete (ROS) M/D/1 queue we have from [5] (after properly substituting $\lambda/m$ for $\lambda$ for our case here) that $$\overline{S^2} = \frac{5\lambda^3 + m\lambda^2 - 12m^2\lambda + 12m^3}{6(m-\lambda)^2(2m-\lambda)}$$ (12) From Eqs. (3), (9) and (12) we get $$\overline{W_1} = \frac{11\lambda^4 - (32m+6)\lambda^3 + (48m+30)m\lambda^2 - \frac{6(m-\lambda)(2m-\lambda)}{-(24m+48)m^2\lambda + 24m^4}}{\frac{(\lambda^2 - 2(1+m)\lambda + 2m)}{(\lambda^2 - 2(1+m)\lambda + 2m)}}$$ $\overline{W_2}$ is merely given by Eq. (4) where $\gamma$ is as in Eq. (11), since the input rate to an output buffer is equal to the aggregate throughput of the switching planes. Both $\overline{W_1}$ and $\overline{W_2}$ are independent of n, hence is also the mean total waiting time $\overline{W}$ . As throughput attains its maximum (i.e., when $\lambda \to 1 + m - \sqrt{1 + m^2}$ ) then $\overline{W_1} \to \infty$ , thus $\overline{W} \to \infty$ . As $m \to \infty$ then $\lim_{m \to \infty} \overline{W_1} = 1$ and $$\lim_{m\to\infty} \overline{W_2} = \overline{W}_{M/D/1} = \frac{\lambda}{2(1-\lambda)}$$ ; as a result $$\lim_{m \to \infty} \overline{W} = \frac{2 - \lambda}{2(1 - \lambda)} \tag{13}$$ Figure 4: Mean waiting time vs. input load in the Generic Multiplane Switch (Single Input Queues). Figure 5: Cell drop probability in the Generic Multiplane Switch (Single Input-Queues). Finding the cell drop probability we adopt a result from the analysis of the $Geom(\lambda)/Geom(q)/1$ queue in [4, Eq. (A.5)] which we write here (after substituting for $q = \frac{1}{|S|} = \frac{2(m-\lambda)}{2m-\lambda}$ $$P_{B_1} = \frac{\left(\lambda^2 - 2(1+m)\lambda + 2m\right) \left(\frac{\lambda^2}{2(1-\lambda)(m-\lambda)}\right)^{K_1}}{2(m-\lambda) - \lambda(2m-\lambda) \left(\frac{\lambda^2}{2(1-\lambda)(m-\lambda)}\right)^{K_1}} \tag{14}$$ $P_{B_2}$ is numerically computed based on Eq. (6) and [4, Eq. (A.10)]. The switch's overall cell drop probability is then given as in Equation (5). We see that also the cell drop probability is independent of n, which is of course consistent with our previous results about the throughput and the mean waiting time. Figs. 5 (a)-(b) show the cell drop probability for different values of m and input - output queue sizes, $K_1$ and $K_2$ respectively (in Fig. 5 (b) two curves are overlapping). We see that as m increases the cell drop probability becomes more sensitive to the output queue size. This is reasonable since the larger the m the more cells are transferred to the output queues. For a constant m the cell drop probability is more sensitive to the input queue size. This is attributed to the acute effect of the HOL blocking phenomenon. We mentioned at the beginning of this section that Scheme A is a special case of the general scheme for n=m. Since the throughput and mean waiting time in the general scheme are independent of n then it is obvious that these are also the same in the scheme A multiplane switch. The cell drop probability is the same as for the generic switch as well. An interesting observation is that this type of multiplane switch has exactly the same throughput as our single-crossbar Multiple Input-Queueing switch [3], in which each input port expands into m queues. The mean waiting times are different though as might be anticipated since the multiplane switch can achieve smaller waiting times due to the output queueing effect. ## 5 Scheme B - Dedicated Access Figure 6: A 4 $\times$ 4 scheme B ATM Switch with m=2 switching planes. As another special case of the generalized scheme in introduced in section 4 we consider the case where n=1. Each switching plane has $\frac{N}{m}$ inputs and thus it operates as an $\frac{N}{m} \times N$ crossbar (with traffic load $\frac{\lambda N}{m}$ ). We refer to this specific multiplane design as scheme B (Fig. 6). Accordingly, the asymptotic saturation throughput for this type of multiplane switch is again $$\gamma_{max} = 1 + m - \sqrt{1 + m^2}$$ It is clear that since scheme B is a special case of the generic multiplane switch for n=1 then the results derived for the mean waiting time and cell loss in the previous section apply here as well. For m=N, Scheme B degenerates into a single crossbar switch with output queues only (no multiple planes or input queues). In this case: $\overline{W} = \overline{W_2} = \frac{\lambda}{2(1-\lambda)}$ and $\gamma_{max} = 1$ . [2] The obvious tradeoff between the different architectures that can emerge from the generic multiplane model (for the different values of n) is the wiring cost (number of crosspoints) vs. the offered redundancy: less planes means less redundancy for a multiplane switch. For instance, we mention that in scheme B the total number of crosspoints is $m\frac{N}{m}N=N^2$ , while scheme A yields $mN^2$ crosspoints; that is m times more than scheme B! In conclusion, we indicate that schemes A and B present the two extremes in terms of redundancy: while scheme A offers a 100% redundancy scheme B has no redundancy at all as each input line can access only one plane. # 6 Scheme C - Parallel, Multiple Input Queues Figure 7: A 4 $\times$ 4 scheme C ATM Switch with m=2 switching planes. In this section we consider multiplane switches where each input port is expanded into more than one input queue. Each input queue is (permanently) connected to one or more switching planes. More specifically we allow $n \ (n \le m)$ queues per input port, where each queue feeds m/n planes. For instance, for m=16, n=4 we have four queues per input and each queue is linked to four planes. An incoming cell chooses randomly which of the n queues to join. We show that the maximum throughput for this generic type of multiplane switch is: $$\gamma = n + m - \sqrt{n^2 + m^2} \tag{15}$$ Again, we consider the mean waiting time for a typical input queue which is obtained from Eq. (3) after substituting $\lambda/n$ for $\lambda$ , since the arrival rate to each input queue is now $\lambda/n$ (i.e., $Geom(\lambda/n)/G/1$ ): $$\overline{W_1} = \overline{S} + \frac{\lambda(\overline{S^2} - \overline{S})}{2(n - \lambda \overline{S})} \tag{16}$$ <sup>&</sup>lt;sup>6</sup>Note that this stochastic process is independent of m. $<sup>^7</sup>$ Without any loss of generality we assume that n divides exactly m . We can show that (for $N \to \infty$ ) the distribution of $A_{j,k}$ is again a Poisson with parameter $\lambda/m$ . Thus, $\overline{S}$ and $\overline{S^2}$ are given by Eqs. (9) and (10) respectively, since the service time in the $Geom(\lambda/n)/G/1$ queue is the same as the total delay in a discrete $M(\lambda/m)/D/1$ queueing system. Applying the same argument about the maximum throughput as in scheme A, we set the denominator of Eq. (16) equal to zero and then solve for $\lambda$ : $$\lambda = \frac{n}{\overline{S}} = \frac{2n(m-\lambda)}{2m-\lambda}$$ yielding $$\gamma_{max} = n + m - \sqrt{n^2 + m^2} \tag{17}$$ From Eq. (17) we readily have that $\lim_{m\to\infty} \gamma_{max} = n$ . We can write the maximum throughput as $\gamma_{max} = n \left(1 + \frac{m}{n} - \sqrt{1 + \left(\frac{m}{n}\right)^2}\right)$ which means that it is n times the maximum throughput of a (scheme A) multiplane switch with m/n independent planes. We then immediately recognize that we could have arrived at Eq. (17) by just considering Eq. (10) for m/n planes. From Eq. (17) it is clear that at saturation $\gamma_{max} = 1$ for n > 1. In other words, a multiplane switch with n = 2 (input queues) and m = 2 (switching planes) is the minimum required configuration in order to achieve a 100% throughput. Fig. 7 shows such a switch. We refer to the special case of multiplane switch where n = m, as scheme C. In scheme C each plane has effectively each own dedicated queue on every input. Figure 8: Mean waiting time vs. input load in the Generic Multiplane Switch (Multiple Input Queues). With respect to the mean waiting time we have that Eq. (16) yields $$\overline{W_1} = \frac{11\lambda^4 - (32m + 6n)\lambda^3 + (48m + 30n)m\lambda^2 - 6(m - \lambda)(2m - \lambda)}{6(m + 48n)m^2\lambda + 24m^3n}$$ $$\frac{-(24m + 48n)m^2\lambda + 24m^3n}{(\lambda^2 - 2(m + n)\lambda + 2mn)}$$ $\overline{W_2}$ is given by Equation (4) where $$\gamma = \begin{cases} \lambda, & \lambda < \gamma_{max} \\ n + m - \sqrt{n^2 + m^2}, & \lambda \ge \gamma_{max} \end{cases}$$ (18) As $m \to \infty$ then $\lim_{m \to \infty} \overline{W_1} = 1$ and $\lim_{m \to \infty} \overline{W_2} = \frac{\lambda}{2(1-\lambda)}$ , $$\lim_{m \to \infty} \overline{W} = \lim_{m \to \infty} (\overline{W_1} + \overline{W_2}) = \frac{2 - \lambda}{2(1 - \lambda)}$$ (19) which is independent of n. In fact we notice that Eq. (19) is the same as Eq. (13). Figure 9: Cell drop probability in the Generic Multiplane Switch (Multiple Input-Queues). Figs. 8 (a)-(b) show the mean waiting times of multiplane switches for various values of m and n. It is noteworthy mentioning that for n > 1 the n-curves collapse into a single curve. This is, in a way, consistent with our observation regarding throughput, which leads us to the conclusion that n=2 is sufficient for implementing the type of multiplane switch we described in this section. For the various values of m we see that m=4 qualifies as a good choice for m. As far as the cell drop probability is concerned we can write from Equation (A.15), where now an input queue is modeled as a $Geom(\lambda/n)/Geom(q)/1$ queue, that $$P_{B_1} = \frac{(\lambda^2 - 2(n+m)\lambda + 2nm) \left(\frac{\lambda^2}{2(n-\lambda)(m-\lambda)}\right)^{\frac{K_1}{n}}}{2n(m-\lambda) - \lambda(2m-\lambda) \left(\frac{\lambda^2}{2(n-\lambda)(m-\lambda)}\right)^{\frac{K_1}{n}}} (20)$$ since $q = \frac{1}{\overline{S}} = \frac{2(m-\lambda)}{2m-\lambda}$ . Note that, for fairness (i.e., maintain the same total buffer space), the buffer capacity per input port, namely $K_1$ , is equally allocated to the n queues, that is each input-queue size is $K_1/n$ . Equation (20) combined with Eqs. (6) and [4, Eq. (A.10)] gives the cell drop probability as described in Eq. (5). In Fig. 9 we illustrate the various tradeoffs that arise using different values of m, n and queue sizes $(K_1, K_2)$ . Concluding this section we mention that for n = 1 we readily obtain scheme A. ## 7 Scheduling Considerations We recognize that the performance of the aforementioned multiplane switches can be considerably improved if we take into account special scheduling as to which planes the cells should be transferred through in order to minimize the output contention (and effectively the HOL blocking). An example of a scheduling algorithm is one that employs a knockout arbitration scheme: all HOL cells destined for the same output try to access the first plane; those that loose the contention try then the second and so on. This assumes a multiple contention resolution phase within the same timeslot. In another similar scheduling example HOL requests to the same output destination could be queued up in a (virtual) queue where each would be then served by a different plane thus trying to minimize the output contention within the same crossbar switching plane. These scheduling algorithms result into a performance identical to the one achieved by a speedup switch with a speedup factor of m.[7] It is apparent that in both cases all up to m HOL cells destined for the same output are guaranteed to be transferred to that output. ### 8 Conclusion Multiplane switches are architecturally appealing as they are simple designs. They are cost effective and practical solutions as they can combine two or more switching fabrics within the same switch box. We studied the performance of various multiplane designs with respect to their throughput, mean waiting time and cell drop statistics. Note that from the mean waiting times one can easily derive the maximum throughput of the switch in question. The switches under consideration had common output queues and common or separate input queues (schemes A-B and C respectively). In the case of separate input queues we can avoid any sequencing problems by forcing cells that belong to the same traffic stream (i.e., VC) to be queued in the same input thus avoiding having them routed through different planes. In general, separate queues can be used to implement traffic streams with priorities or with individual traffic requirements (i.e., as to guarantee the promised QoS) or according to the cell's destination. Moreover, we demonstrated the tradeoff between redundancy and wiring cost (schemes A and B) whereas the performance is not affected whatsoever. In addition, we showed that we can further improve the performance of multiplane switches by not only looking at their throughput but also at their mean waiting time and cell drop probability metrics. #### References - [1] J. Y. Hui and E. Arthurs, "A Broadband Packet Switch for Integrated Transport", *IEEE J. Selected* Areas Commun., **SAC-5**(8), pp. 1264-1273, Oct. 1987. - [2] M. J. Karol, M. G. Hluchyj and S. P. Morgan, "Input versus Output Queueing on a Space-Division Packet Switch", *IEEE Trans. Commun.*, COM-35(12), pp. 1347-1356, Dec. 1987. - [3] C. Kolias and L. Kleinrock, "Performance Analysis of Variations on Input-Queueing for ATM Switches", submitted for review. - [4] C. Kolias and L. Kleinrock, "Performance Analysis of Multiplane, Nonblocking ATM Switches", Technical Report, Computer Science Department, UCLA, July 1998. http://millennium.cs.ucla.edu/~ck/tr1.ps - [5] C. Kolias and L. Kleinrock, "An Analytically Simplified Study of ATM Switches with Input-Queueing", submitted for review. - [6] Y. Oie, T. Suda, M. Murata, D. Kolson and H. Miyahara, "Survey of Switching Techniques in High-Speed Networks and Their Performance", in *Proc. IEEE INFOCOM '90*, pp. 1242-1251, San Francisco, CA, June 1990. - [7] Y. Oie, M. Murata, K. Kubota and H. Miyahara, "Effect of Speedup in Nonblocking Packet Switch", in *Proc. IEEE ICC '89*, pp. 410-414, Boston, MA, June 1989. - [8] H. Takagi, Queueing Analysis, A Foundation of Performance Evaluation, Volume 3: Discrete-Time Systems, Elsevier Science Publishers B.V., Amsterdam, The Netherlands (1993).